Word-length Entropies and Correlations of Natural Language Written Texts
نویسندگان
چکیده
منابع مشابه
Word-length entropies and correlations of natural language written texts
We study the frequency distributions and correlations of the word lengths of ten European languages. Our findings indicate that a) the word-length distribution of short words quantified by the mean value and the entropy distinguishes the Uralic (Finnish) corpus from the others, b) the tails at long words, manifested in the high-order moments of the distributions, differentiate the Germanic lang...
متن کاملEntropy analysis of word-length series of natural language texts: Effects of text language and genre
We estimate the n-gram entropies of natural language texts in word-length representation and find that these are sensitive to text language and genre. We attribute this sensitivity to changes in the probability distribution of the lengths of single words and emphasize the crucial role of the uniformity of probabilities of having words with length between five and ten. Furthermore, comparison wi...
متن کاملWord-Length Correlations and Memory in Large Texts: A Visibility Network Analysis
We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, ...
متن کاملDistinct word length frequencies: distributions and symbol entropies
The distribution of frequency counts of distinct words by length in a language’s vocabulary will be analyzed using two methods. The first, will look at the empirical distributions of several languages and derive a distribution that reasonably explains the number of distinct words as a function of length. We will be able to derive the frequency count, mean word length, and variance of word lengt...
متن کاملA framework for conflict analysis of normative texts written in controlled natural language
In this paper we are concerned with the analysis of normative conflicts, or the detection of conflicting obligations, permissions and prohibitions in normative texts written in a Controlled Natural Language (CNL). For this we present AnaCon, a proof-of-concept system where normative texts written in CNL are automatically translated into the formal language CL using the Grammatical Framework (GF...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of Quantitative Linguistics
سال: 2015
ISSN: 0929-6174,1744-5035
DOI: 10.1080/09296174.2014.1001636